Abstract

The evolution of cooperation has been a perennial problem 
for evolutionary biology because cooperation is undermined 
by selfish cheaters (or "free riders") that profit from cooper- 
ators but do not invest any resources themselves. In a purely 
"selfish" view of evolution, those cheaters should be favored. 
Evolutionary game theory has been able to show that under 
certain conditions, cooperation nonetheless evolves stably. 
One of these scenarios utilizes the power of punishment to 
suppress free riders, but only if players interact in a structured 
population where cooperators are likely to be surrounded by 
other cooperators. Here we show that cooperation via pun- 
ishment can evolve even in well-mixed populations that play 
the "public goods" game, if the synergy effect of coopera- 
tion is high enough. As the synergy is increased, populations 
transition from defection to cooperation in a manner remi- 
niscent of a phase transition. If punishment is turned off, 
the critical synergy is significantly higher, illustrating that (as 
shown before) punishment aids in establishing cooperation. 
We also show that the critical point depends on the mutation 
rate so that higher mutation rates discourage cooperation, as 
has been observed before in the Prisoner's Dilemma. 



Introduction 

"Tragedy of the commons" is the name given to a social
dilemma (Hardin, 1968) that occurs when a number of individuals
maximize their self-interest by exploiting a public good, and by
doing so harm their own (and others') long-term interest. This is
but one dilemma (Frank, 2006) that can be described within the
framework of evolutionary game theory (Smith, 1982; Axelrod, 1984;
Dugatkin, 1997; Hofbauer and Sigmund, 1998; Nowak, 2006). While the
tragedy of the commons is important in social science and politics
(overfishing and the destruction of the environment in general come
to mind), it also plays an important role in biology: both the
evolution of virulence (Frank, 1996) and the manipulation of a host
by a group of parasites (Brown, 1999) can be viewed as a dilemma of
the public goods type.
The public goods game is a standard of experimental economics
(Davis and Holt, 1993; Ledyard, 1995), where players possess a
number of tokens that they can contribute to a common pool (the
"investment" into the public good). The total contributed by the
players is multiplied by a "synergy factor", and this amount is
then equally distributed to the players in the pool, irrespective
of whether they have contributed or not. A group of players fares
best if all the players contribute so as to take maximum advantage
of the synergy, but this behavior is vulnerable to "free-riders"
that share in the pool but do not contribute themselves. In fact,
the rational Nash equilibrium of the game is for all players to
withhold their tokens.

It has been shown that punishment is an effective way to
counteract defectors (Fehr and Gächter, 2002; Fehr and Fischbacher,
2003; Hammerstein, 2003; Nakamaru and Iwasa, 2006; Camerer and
Fehr, 2006; Gürerk et al., 2006; Sigmund et al., 2001; Henrich and
Boyd, 2001; Boyd et al., 2003; Brandt et al., 2003; Helbing et al.,
2010). Because punishment involves an additional cost to the
cooperators that already invest into the public good (Yamagishi,
1986; Fehr, 2004; Colman, 2006), these cooperators (termed
"moralists" by Helbing et al., 2010) are themselves vulnerable to
invasion by non-punishing cooperators called "secondary
free-riders". As a consequence, we might expect that moralists
ultimately become extinct, either because they are outcompeted by
defectors, or by cooperating free-riders who benefit from the
punishment without the associated cost. Alternatively, if moralists
are ultimately successful in eliminating defectors, the punishment
gene ceases to be under selection and should drift, again resulting
in the demise of moralists.

It has recently been shown that, instead, in simple spatial
games, moralists can win direct competitions (Helbing et al., 2010)
if the environmental conditions are favorable, namely if the
cost-to-benefit ratio of punishment favors moralists over
defectors. Spatial games, where the offspring of successful
strategies are placed near the parent, and where as a consequence
strategies are more likely to play against kin strategies, give
rise to spatial reciprocity (Sigmund et al., 2001). This appears to
be the advantage that moralists need to gain superiority. In the
simulations of Helbing et al., evolution proceeded by the imitation
of successful neighboring strategies rather than Darwinian
evolution, but the dynamics are similar. However, because
strategies in those simulations are deterministic (limiting genetic
space to four genotypes), large grids had to be used in order to
prevent premature extinctions.

Here, we show that spatial reciprocity is in fact not a nec- 
essary condition for the evolution of cooperation via punish- 
ment and the dominance of moralists, if stochastic strategies 
can evolve via Darwinian dynamics in a framework where 
decisions are encoded within genes that adapt to their en- 
vironment. There are conditions where cooperation evolves 
even without punishment, but absent those, punishment can 
promote the evolution of cooperation, as long as punishment 
is effective and cheap, in well-mixed populations. If coop- 
eration becomes so dominant that defectors are brought to 
extinction, the punishment gene drifts to neutrality. Finally, 
we also observe that stable environments that are believed 
to be more predictable for players also increase the chance 
for cooperators to evolve and to be stable, as observed earlier
within the iterated Prisoner's Dilemma (Iliopoulos et al., 2010).

Experimental Design 

We evolve stochastic strategies playing the public goods
game with punishment. Each individual in a group of k
players (k = 5 in the present implementation) can decide
to cooperate by making a contribution of 1 unit to the public
good, while defecting individuals do not contribute. We encode
this choice as a probability p_C, which can be thought
of as the outcome of a network of genes that encode this
decision. When mutating strategies, instead of mutating the
individual genes that make up the decision pathway, we simply
replace the parental probability p_C by a uniformly drawn
random number in the offspring. We will call the locus encoding
the probability p_C simply the "C gene".

The sum of all contributions from cooperating players is
multiplied by r (the synergy factor) and divided among all
players. In addition, each player has the option to punish
players who do not contribute. This decision is encoded by
an independent probability p_P, called the "P gene". Following
Helbing et al. (2010), those players who defect suffer
a fine β/k levied by the punishers in the group, whereas
the punishers incur a penalty of γ/k. At each update, every
player engages in a game with all its assigned opponents.
The number of cooperators N_C, defectors N_D, moralists
N_M, and immoralists N_I (players who defect but also punish;
Helbing et al., 2010) is computed, and the payoff of each
player is assigned according to its type (cooperator, moralist,
defector, or immoralist).
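The payoff rules described in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used for the experiments: the names `Move` and `group_payoffs` are hypothetical, and we assume that fines and punishment costs are normalised by the number of co-players in the group.

```python
from dataclasses import dataclass


@dataclass
class Move:
    cooperates: bool  # contributed 1 unit to the pool
    punishes: bool    # punishes defecting co-players


def group_payoffs(moves, r, beta, gamma):
    """Return one payoff per player for a single public goods group."""
    n = len(moves)
    k = n - 1  # assumption: fines and costs are normalised by the co-player count
    share = r * sum(m.cooperates for m in moves) / n  # pool is split equally
    payoffs = []
    for i, me in enumerate(moves):
        others = moves[:i] + moves[i + 1:]
        pay = share - (1.0 if me.cooperates else 0.0)
        if not me.cooperates:
            # fine beta/k from every punisher among the co-players
            pay -= beta / k * sum(o.punishes for o in others)
        if me.punishes:
            # cost gamma/k for every defector among the co-players
            pay -= gamma / k * sum(not o.cooperates for o in others)
        payoffs.append(pay)
    return payoffs
```

In this sketch a moralist is a player with `cooperates=True, punishes=True`, and an immoralist is one with `cooperates=False, punishes=True`.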
The population consists of 1,024 individuals who each
have four assigned opponents. Since all opponents are also
players, each individual plays five games per update. The
choices of each individual are determined by their probabilities
to cooperate, p_C, and to punish, p_P. After each
round, 2 percent of the population is replaced using a Moran
process (Moran, 1962) in a well-mixed fashion, that is, the
identity of the players in the group is unrelated to their
ancestry so that, effectively, the members of a particular
playing group are randomly selected from the population. We
verified that the probability for a player to encounter
cooperators is independent of whether that player is a
cooperator or a defector, as is required for well-mixed
populations (Fletcher and Doebeli, 2009). Players that are not
replaced are allowed to accumulate their score, which is used
to calculate the probability that this player's strategy will be
chosen to replicate and fill the spot of a player that was
removed in the Moran process.
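A minimal sketch of one such replacement step is given below. It is an illustration only: the function name `moran_step`, the representation of a genotype as a `(p_C, p_P)` tuple, and the shifting of scores to obtain positive selection weights are all assumptions, since the text does not specify how scores are mapped to replication probabilities.

```python
import random


def moran_step(population, scores, replace_fraction=0.02, mu=0.02, rng=random):
    """Replace a fraction of players by (possibly mutated) copies of
    parents drawn with probability proportional to accumulated score.

    population: list of (p_c, p_p) tuples; scores: accumulated payoffs.
    """
    n = len(population)
    n_replace = max(1, round(replace_fraction * n))
    dead = rng.sample(range(n), n_replace)  # distinct slots to be refilled
    # Assumption: scores are shifted so that all selection weights are positive.
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]
    parents = rng.choices(range(n), weights=weights, k=n_replace)
    new_pop, new_scores = list(population), list(scores)
    for slot, parent in zip(dead, parents):
        # Each gene is replaced by a uniform random number with probability mu.
        genes = [rng.random() if rng.random() < mu else g
                 for g in population[parent]]
        new_pop[slot] = tuple(genes)
        new_scores[slot] = 0.0  # newborns start without accumulated score
    return new_pop, new_scores
```

Survivors keep their accumulated score in this sketch, which mirrors the statement that players that are not replaced continue to accumulate payoff.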

Every individual's genes mutate with a probability μ
when replicated. As mentioned earlier, the mutation of a
gene replaces the probability with a uniformly distributed
random number. After 500,000 updates, the line of descent
(LOD) of the population is reconstructed by picking a random
organism of the final population and following its ancestry
all the way back to the starting organism, which has
p_C = 0.5 and p_P = 0.5. Because there is only one species
in these populations, the LOD of the population coalesces to
a single LOD (which is why it is sufficient to pick a random
genotype for following the LOD).
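Reconstructing a line of descent amounts to following parent links backwards. The sketch below assumes, purely for illustration, that every organism stores a reference to its parent; the class `Organism` and the function `line_of_descent` are hypothetical names.

```python
class Organism:
    """A genotype together with a link to the organism it was copied from."""

    def __init__(self, p_c, p_p, parent=None):
        self.p_c = p_c
        self.p_p = p_p
        self.parent = parent  # None marks the starting organism


def line_of_descent(organism):
    """Follow parent links back to the founder; return a founder-first list."""
    lod = []
    while organism is not None:
        lod.append(organism)
        organism = organism.parent
    lod.reverse()
    return lod
```

Calling `line_of_descent` on any organism of the final population returns its ancestors in chronological order, so the gene values along the list record the mutations that survived.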

As the strategies adapt to the environmental conditions 
(specified by the parameters that define the game, as well 
as the spatial properties, the mutation rate, and the replace- 
ment rate), the probabilities that appear on the LOD tell the 
story of that adaptation, mutation by mutation. While the 
LOD in each particular run can show probabilities varying
wildly, averaging many such LODs can tell us about the selective
pressures the populations face. In particular, averaging
the probabilities on the LODs after they have settled
down (from the transient beginning at the random strategy
(p_C, p_P) = (0.5, 0.5)) can tell us the fixed point of
evolutionary adaptation (Iliopoulos et al., 2010). We determine
this fixed point by discarding the first 250,000 updates of
every run (the transient), along with the last 50,000 (in order
to remove the dependence of the LOD on the randomly chosen
anchor genotype) and averaging the remaining 200,000
updates. Note that this fixed point is a computational fixed
point only: we do not mean to imply that the population's
genotypes all end up on this exact point. Rather, due to the
nature of the game, the evolutionary trajectories approach
this point and then fluctuate around or near it. Thus, the
fixed point reflects the mean successful strategy given the
conditions of the game.
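The averaging window described above can be written down directly. The sketch assumes that the LOD has been expanded into one gene value per update, which is an assumption about the data layout rather than something stated in the text; `fixed_point` is a hypothetical helper name.

```python
def fixed_point(lod_values, transient=250_000, tail=50_000):
    """Average a per-update series after dropping the transient and the tail."""
    window = lod_values[transient:len(lod_values) - tail]
    if not window:
        raise ValueError("series is too short for the requested window")
    return sum(window) / len(window)
```

For a run of 500,000 updates the default arguments leave exactly the 200,000 updates in the middle of the run.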

Results 

When mapping the possible parameters β (effectiveness)
and γ (cost), each in the range from 0.0 to 1.0, at low
synergy r = 3.0, we find that defection is the most prevalent
strategy on the LOD (see Figures 1a and b), as was found
previously (Brandt et al., 2003; Helbing et al., 2010). When
β and γ vanish, punishment has no effect, nor is there a cost
associated with that punishment. At this point, the P gene is
not under selection and drifts. A drifting gene can be recognized
by a mean of 0.5 and a variance of 1/12 ≈ 0.083 at
the fixed point, as expected for the average and variance of a
uniform random number on the interval (0,1). Thus, for this
value of synergy (and lower), we find that the strategy fixed
point is defection without punishment, except for the values
γ = β = 0, where punishment is random.
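The drift signature (mean 0.5, variance 1/12) is easy to check numerically. The following sketch is illustrative only; the helper name `looks_like_drift` and the tolerance of 0.02 are arbitrary choices and not part of the original analysis.

```python
import random
from statistics import fmean, pvariance


def looks_like_drift(values, tol=0.02):
    """True if mean and variance match a uniform variable on (0, 1)."""
    mean_ok = abs(fmean(values) - 0.5) < tol
    var_ok = abs(pvariance(values) - 1.0 / 12.0) < tol
    return mean_ok and var_ok


# A gene that is repeatedly overwritten by uniform random numbers passes,
# while a gene held at a constant value fails on the variance.
rng = random.Random(0)
drifting = [rng.random() for _ in range(20_000)]
selected = [0.5] * 100
```

Note that a mean of 0.5 alone is not sufficient: a gene pinned at 0.5 has the right mean but zero variance, which is why both moments are compared.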

As the degree of synergy increases to r = 4, cooperation
starts to appear even in this well-mixed population (while it
appears as early as r = 2 for sufficiently high β and low γ in
the spatial version of the game, see Brandt et al., 2003;
Helbing et al., 2010). We find players cooperating (p_C ≈ 0.8) at
high β and low γ (see Figure 2a), which indicates that under
conditions where punishment is not very costly or even free,
punishment pays off. In addition we notice that the probability
to punish increases under the same conditions that allow
cooperation (high β and low γ, that is, high impact and low cost
of punishment), indicating that punishment is indeed used to
enforce cooperation (Fig. 2b). The mean punishment probability
grows to 0.5, but at the same time the variance shows
that this gene is not under drift (data not shown). Still, the
distribution of probabilities on the LOD is fairly broad,
indicating that periods of strong punishment give way to periods
where agents are much more forgiving. Thus, it appears that
punishment under these conditions is effective even if it is
engaged in only intermittently.

Increasing the synergy level even higher towards
r = 4.5 shows the emergence of dominance of cooperation
(p_C > 0.5) for most of the range of punishment cost and
effectiveness, see Figure 3a. At the same time the punishment
probability reaches 0.5 for a larger range of parameters
(Fig. 3b), but the mean punishment probability on the LOD
never exceeds 0.5, implying that full persistent punishment
is not stable, and probably not necessary. Increasing synergy
to r = 5 reveals a population that engages in cooperation
for almost all parameter settings (see Figure 4), even at
conditions where punishment is costly without much impact
(β < 0.5, γ > 0.5), but the variance suggests that at high
punishment effect and low cost, this gene may be drifting
(as it is only selected for if defectors are prominent). This
outcome is expected because at r = 5, the cooperators' payoff
is equal to or higher than the defectors', and exactly equal
in the absence of punishment (see below). Thus, defectors
should disappear and punishment become random.

Figure 1: (a) Mean probabilities for cooperation p_C at the
evolutionary fixed point. Despite the different grey scales,
the probability to cooperate vanishes except for noise. (b)
The probability to punish p_P at the average fixed point. For
both genes, β and γ range from 0.0 to 1.0, at
a fixed r = 3. We used a mutation rate μ = 0.02 per gene in
this and the following three figures. Here, we averaged two
replicates for each set of parameters.

Note that, in an implementation where decisions are
deterministic (such as in the implementation of Helbing et
al., 2010), punishment may remain for a long time in the
population even though it is no longer selected. In fact,
Helbing et al. (2010) increased their population sizes
precisely because they observed the disappearance of punishment
in smaller cooperating populations. From what we have
observed here, this disappearance is due entirely to neutral
drift.

Critical Behavior 

Previously, a phase transition between cooperative and
defective behaviour in the public goods game was observed for
the spatial version (Szabó and Hauert, 2002; Brandt et al.,
2003) of the game (but not the well-mixed version). In Fig. 5
we show the mean probability at the evolutionary fixed point
of both the C gene (black lines) and the P gene (grey lines) as
a function of the synergy level r, for different mutation rates
(dash-dotted lines: μ = 0.001, dashed lines: μ = 0.01, and
solid lines: μ = 0.02, which is the mutation rate we used in
Figs. 1-4). We note the sudden emergence of cooperation at a
critical synergy level, but that this level depends on the
mutation rate. For the highest mutation rate (black solid line in
Fig. 5) cooperation emerges the earliest; however, the critical
point (defined as the point where the cooperation probability
reaches 0.5) moves towards higher synergy levels. As the
mutation rate is lowered, the critical point moves to the left
and the fixed point probability is higher. The emergence of
punishment (grey lines in Fig. 5) follows the same trend, and
again we notice that the mean never exceeds 0.5. Thus, we
see that higher mutation rates lead to higher critical synergy
values necessary to enable cooperation; in other words, the
more uncertain environments are discouraging for cooperators,
as observed earlier (Iliopoulos et al., 2010).

Figure 2: Mean probabilities for p_C (a) and p_P (b) averaged
over the latter part of the LOD (average fixed point), for β
and γ ranging from 0.0 to 1.0, in increments of 0.2, at r = 4
(5 replicates per data point).

Figure 3: Mean probabilities for p_C (a) and p_P (b) at the
evolutionary fixed point, for β and γ ranging from 0.0 to 1.0
in increments of 0.2, at r = 4.5 (15 replicates per data point).

It is instructive to study how punishment affects the
critical point. To do this, we ran a control of the experiment
where punishment did not exist. In that case, we observe
a critical r that is significantly higher than what we observe
with punishment (see Fig. 6), showing again how punishment
aids in the establishment of cooperation. Note also that
the levels of cooperation achieved are significantly higher
when punishment exists.

Figure 4: Mean probabilities for p_C (a) and p_P (b) at the
evolutionary fixed point, for β and γ ranging from 0.0 to 1.0
in increments of 0.2, at r = 5 (five replicates per data point).

We can calculate approximately the point at which cooperation
is favored in a mean-field approach that does not
take mutation and evolution into account, by writing Eqs. (1-2)
in terms of the density of cooperators ρ_C encountered
by players in a group. Both naked cooperators and punishing
cooperators (moralists) contribute to this density, i.e.,
ρ_C = (N_C + N_M)/N, where N is the total number of players
in the group. We can also introduce the mean density of
punishers ρ_P = (N_M + N_I)/N encountered by a player.
Because the mean density of cooperators and punishers is
the same for both cooperators and defectors in a well-mixed
scenario (but not for spatial play!), we can then write the
mean payoffs of cooperators, P_C, and of defectors,

\[ P_D = r\,\frac{k\rho_C}{k+1} - \beta\rho_P , \]

and we expect cooperation to be favored if P_C > P_D.
This equation implies that the emergence of cooperation
depends crucially on the density of punishers. In fact, the
mean-field theory predicts that cooperation in the absence of
punishment emerges only at r = 5, while we see it emerge
quite a bit earlier than that (see Fig. 6, dashed lines). Note,
however, that the critical point moves towards the predicted
value r = 5 as the mutation rate is lowered, which would
not be surprising as the theory holds strictly only for
vanishing mutation rate. Because we expect that the density
of punishers increases as the mutation rate increases (because
mutations can introduce defectors at an elevated rate,
necessitating a more pronounced punishment response), we
can also expect the critical mutation rate to drop
commensurately, but it is clear from the previous comment that
there are mutation rate effects in the dynamics of the population
that are independent of punishment.

Figure 5: Mean probability of cooperation p_C (black lines)
and punishment p_P (grey lines) at the evolutionary fixed
point of the trajectory, as a function of the synergy r for three
different mutation rates: dash-dotted: μ = 0.001, dashed:
μ = 0.01, and solid: μ = 0.02 (100 replicates for each data
point).
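The mean-field threshold discussed above can be made explicit with a short worked derivation. It rests on an assumption that is not preserved in the text: that a cooperator's mean payoff has the Helbing-style form P_C = r(kρ_C + 1)/(k+1) − 1, i.e., the cooperator's own contribution enters the pool and costs one unit. Under that assumption, and with the defector payoff P_D = r kρ_C/(k+1) − βρ_P,

```latex
\begin{align}
P_C - P_D &= \frac{r\,(k\rho_C + 1)}{k+1} - 1
             - \left(\frac{r\,k\rho_C}{k+1} - \beta\rho_P\right) \\
          &= \frac{r}{k+1} - 1 + \beta\rho_P , \\
P_C > P_D \quad &\Longleftrightarrow \quad r > (k+1)\,\bigl(1 - \beta\rho_P\bigr).
\end{align}
```

For groups of five (k + 1 = 5) and no punishers (ρ_P = 0) this gives r > 5, in agreement with the value quoted for the punishment-free case, and any positive density of punishers lowers the threshold. The density of cooperators ρ_C cancels, so in this sketch only the punishers shift the critical synergy.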

Because of the critical importance of punishers in deter- 
mining the synergy level at which cooperation emerges, the 
public goods game with a genetic basis implies a curious dy- 
namics close to the critical point. Below the critical point, 
defection is a stable strategy, and punishment is absent. Only 
when cooperation emerges as a possibility, punishment be- 
comes more and more important, leading to a lowering of 
the critical synergy for cooperation via Eq. (8). Thus, cooperation
emerges rapidly and decisively once a critical level
has been achieved. Once cooperation is dominant and de- 
fectors are all but driven to extinction, punishment becomes 
irrelevant and the gene begins to drift. As this happens, 
the fraction of punishers drops, raising the critical synergy. 
Thus, a drifting punishment gene can lead to the sudden re- 
emergence of defectors as stable states. Once those have 
taken over, the reverse dynamics begins to unfold. In other 
words, we should observe periods of cooperation and de- 
fection follow each other closely as the synergy is near the 
critical point. 

This dynamics is reminiscent of the phenomenon of
supercooling and superheating in phase transitions. If we
imagine the synergy parameter r as the critical parameter
and the mean probability to cooperate as the order parameter,
it is possible that when r is slowly increased, the population
remains in the defecting phase because a switch to
cooperation requires a critical number of cooperators as a
"seed". In such a situation, the defecting phase is unstable
to fluctuations. If a critical number of cooperators emerges by
chance, punishment immediately becomes effective, lowers
the critical point as implied by Eq. (8), and the population
could transition to cooperation very quickly. The shape of
the critical curves in Fig. 5 supports this point of view: at
higher mutation rates, the transition from defection to
cooperation as the synergy r is increased is much more gradual,
presumably because the increased mutation rate increases
the probability to create the seed of cooperators that is
necessary for the emergence of the cooperative phase. The
population dynamics at the critical point will be the subject
of a subsequent investigation.

Figure 6: Mean probabilities for p_C at the fixed point, for
effectiveness β = 0.8 and cost of punishment γ = 0.2, as a
function of synergy r. Solid lines are the standard protocol,
while the dashed lines represent experiments with punishment
turned off (p_P = 0). Black: μ = 0.02, grey: μ = 0.01
(20 replicates per curve).

Discussion 

We studied Darwinian evolution of stochastic strategies in 
the public goods game for well-mixed populations, using 
genes that encode the probabilities for cooperation and
punishment. It is known that punishment can drive the evolution
of cooperation above a critical synergy level as long as there
is a spatial structure in the environment (Brandt et al., 2003;
Helbing et al., 2010). It was also previously believed that in
well-mixed populations cooperation can only become successful
if additional factors like reputation (Sigmund et al.,
2001) are influencing the evolution. Here we show that
cooperation readily emerges in a well-mixed environment 
above a critical level of synergy. This critical level is influ- 
enced by a number of factors, such as the rate of punishment 
and the mutation rate. 

If the conditions for punishment are good (that is, the cost 
for punishment is low and the effect is high) we find cooperative
strategies that also have elevated probabilities to punish,
that is, they are moralists. But if punishment is cheap
and effective, we also see that defectors practically vanish, 
which in turn obviates the need for punishment, so much so 
that the punishment gene begins to drift. This effect, how- 
ever, is also mutation rate dependent, because higher muta- 
tion rates will automatically create a higher influx of defec- 
tors even if they cannot be maintained by selection. 

We conclude that in well-mixed populations cooperation 
can emerge if the synergy outweighs the defectors' reward. 
If the mutation rate is low enough, the loss of defectors 
makes punishment obsolete, that is, the selective pressure 
to punish disappears. Naturally, once this has occurred de- 
fectors can again gain a foothold, and the balance of power 
between cooperators and defectors could shift. Such a shift, 
however, reinstates the selective pressure to punish, leading 
to a re-emergence of moralists that can drive defectors out 
once more. Thus, for synergy factors near the critical point, 
we can expect oscillations between cooperators and defec- 
tors, and no strategy is ever stable (Hintze et al., 2010).